Dynamic cubing for hierarchical multidimensional data space
نویسندگان
چکیده
Data warehouses are being used in many applications since quite a long time. Traditionally, new data in these warehouses is loaded through offline bulk updates which implies that latest data is not always available for analysis. This, however, is not acceptable in many modern applications (such as intelligent building, smart grid etc.) that require the latest data for decision making. These modern applications necessitate real-time fast atomic integration of incoming facts in data warehouse. Moreover, the data defining the analysis dimensions, stored in dimension tables of these warehouses, also needs to be updated in real-time, in case of any change. In this thesis, such real-time data warehouses are defined as dynamic data warehouses. We propose a data model for these dynamic data warehouses and present the concept of Hierarchical Hybrid Multidimensional Data Space (HHMDS) which constitutes of both ordered and non-ordered hierarchical dimensions. The axes of the data space are non-ordered which help their dynamic evolution without any need of reordering. We define a data grouping structure, called Minimum Bounding Space (MBS), that helps efficient data partitioning of data in the space. Various operators, relations and metrics are defined which are used for the optimization of these data partitions and the analogies among classical OLAP concepts and the HHMDS are defined. We propose efficient algorithms to store summarized or detailed data, in form of MBS, in a tree structure called DyTree. Algorithms for OLAP queries over the DyTree are also detailed. The nodes of DyTree, holding MBS with associated aggregated measure values, represent materialized sections of cuboids and tree as a whole is a partially materialized and indexed data cube which is maintained using online atomic incremental updates. We propose a methodology to experimentally evaluate partial data cubing techniques and a prototype implementing this methodology is developed. The prototype lets us experimentally evaluate and simulate the structure and performance of the DyTree against other solutions. An extensive study is conducted using this prototype which shows that the DyTree is an efficient and effective partial data cubing solution for a dynamic data warehousing environment. Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2013ISAL0011/these.pdf © [U. Ahmed], [2012], INSA de Lyon, tous droits réservés Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2013ISAL0011/these.pdf © [U. Ahmed], [2012], INSA de Lyon, tous droits réservés
منابع مشابه
High-dimensional Hierarchical Olap : a Prefix– Index Hierarchical Cubing Approach
The pre-computation of data cubes is critical for improving the response time of OLAP(online analytical processing) systems and accelerating data mining tasks in large data warehouses. However, as the sizes of data warehouses grow, the time it takes to perform this pre-computation becomes a significant performance bottleneck. In a high dimensional OLAP, it might not be practical to build all th...
متن کاملMDAG-Cubing: A Reduced Star-Cubing Approach
In this paper, we extend the Star-Cubing approach by introducing a new hybrid dimension-based approach to efficiently compute full or iceberg cubes with simple or complex measures. This new approach, named Multidimensional Direct Acyclic Graph Cubing (MDAG-Cubing), introduces the notion of external and internal nodes to reduce the cube representation without loss of generality. The reduced repr...
متن کاملMining Top-K Multidimensional Gradients
Several business applications such as marketing basket analysis, clickstream analysis, fraud detection and churning migration analysis demand gradient data analysis. By employing gradient data analysis one is able to identify trends, outliers and answering “what-if” questions over large databases. Gradient queries were first introduced by Imielinski et al [1] as the cubegrade problem. The main ...
متن کاملAnalysis of Hierarchical Bayesian Models for Large Space Time Data of the Housing Prices in Tehran
Housing price data is correlated to their location in different neighborhoods and their correlation is type of spatial (location). The price of housing is varius in different months, so they also have a time correlation. Spatio-temporal models are used to analyze this type of the data. An important purpose of reviewing this type of the data is to fit a suitable model for the spatial-temporal an...
متن کاملTrajectory tracking of under-actuated nonlinear dynamic robots: Adaptive fuzzy hierarchical terminal sliding-mode control
In recent years, underactuated nonlinear dynamic systems trajectory tracking, such as space robots and manipulators with structural flexibility, has become a major field of interest due to the complexity and high computational load of these systems. Hierarchical sliding mode control has been investigated recently for these systems; however, the instability phenomena will possibly occur, especia...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Decision Systems
دوره 23 شماره
صفحات -
تاریخ انتشار 2014